Duplicate record elimination in large data files
Similar resources
Redundancy Elimination Within Large Collections of Files
Ongoing advancements in technology lead to ever-increasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that...
Duplicate Data Elimination in a SAN File System
Duplicate Data Elimination (DDE) is our method for identifying and coalescing identical data blocks in Storage Tank, a SAN file system. On-line file systems pose a unique set of performance and implementation challenges for this feature. Existing techniques, which are used to improve both storage and network utilization, do not satisfy these constraints. Our design employs a combination of cont...
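The DDE abstract above describes identifying and coalescing identical data blocks, but it is truncated before the design details, so what follows is not Storage Tank's method. It is a minimal Python sketch of generic content-hash block deduplication, assuming fixed-size blocks and SHA-256 digests; `dedup_blocks`, `rebuild`, and the 4096-byte default block size are hypothetical choices made for illustration.

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 4096):
    """Hypothetical helper: split data into fixed-size blocks, store each distinct block once.

    Returns (store, refs): store maps a SHA-256 hex digest to block bytes,
    refs lists the digest of every block in file order.
    """
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]  # last block may be shorter
        digest = hashlib.sha256(block).hexdigest()
        existing = store.get(digest)
        if existing is None:
            store[digest] = block  # first occurrence: keep the bytes
        elif existing != block:
            # Byte comparison guards the coalescing decision against a digest collision.
            raise ValueError("hash collision between distinct blocks")
        refs.append(digest)  # duplicates are coalesced into a shared reference
    return store, refs

def rebuild(store: dict[str, bytes], refs: list[str]) -> bytes:
    """Reassemble the original data from the block store and reference list."""
    return b"".join(store[d] for d in refs)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, refs = dedup_blocks(data)
print(len(refs), "block references,", len(store), "distinct blocks stored")
assert rebuild(store, refs) == data
```

An on-line file system would typically also need reference counts, so that a shared block is freed only when no file still points to it; the sketch omits that bookkeeping.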
A knowledge-based approach for duplicate elimination in data cleaning
Existing duplicate elimination methods for data cleaning work on the basis of computing the degree of similarity between nearby records in a sorted database. High recall can be achieved by accepting records with low degrees of similarity as duplicates, at the cost of lower precision. High precision can be achieved analogously at the cost of lower recall. This is the recall–precision dilemma. We...
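The baseline this abstract describes, comparing nearby records in a sorted database and accepting a pair as duplicates when its similarity clears a threshold, can be sketched in Python as below. This is not the paper's knowledge-based approach. The sort key (the lowercased record string), the window size, and the use of `difflib.SequenceMatcher.ratio` as the similarity measure are assumptions, and `sorted_neighborhood_pairs` is a hypothetical name.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Assumed measure: difflib's ratio on lowercased strings, in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def sorted_neighborhood_pairs(records, window=3, threshold=0.8):
    """Hypothetical helper: return index pairs (i, j), i < j, judged to be duplicates.

    Records are sorted by an assumed key (the lowercased record), and each
    record is compared only with the next `window - 1` records in that order.
    """
    order = sorted(range(len(records)), key=lambda i: records[i].lower())
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            # Accept as duplicates when similarity reaches the threshold:
            # a lower threshold favors recall, a higher one favors precision.
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs

records = [
    "John Smith, 12 Oak St",
    "Jon Smith, 12 Oak St",
    "Mary Jones, 4 Elm Rd",
    "John Smith, 12 Oak Street",
]
print(sorted_neighborhood_pairs(records, threshold=0.8))
print(sorted_neighborhood_pairs(records, threshold=0.95))
```

With these sample records, a threshold of 0.8 pairs records 0, 1, and 3 with one another, while 0.95 keeps only (0, 1) and drops the "Street" variant: higher precision at the cost of recall, which is the recall–precision dilemma the abstract names. The window also bounds the work per record, so duplicates that sort far apart are never compared.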
Modeling Uncertainty in Duplicate Elimination
Real-world databases experience various data quality problems with different causes, including heterogeneity of consolidated data sources, imprecision of reading devices, and data entry errors. Existence of duplicate records is a prominent data quality problem. The process of duplicate elimination often involves uncertainty in deciding on the true duplicates. Current tools resolve such uncertainty...
Sleep Sets Meet Duplicate Elimination
The sleep sets technique is a path-dependent pruning method for state space search. In the past, the combination of sleep sets with graph search algorithms that perform duplicate elimination has often been shown to be error-prone. In this paper, we provide the theoretical basis for the integration of sleep sets with common search algorithms in AI that perform duplicate elimination. Specifically, we ...
Journal
Journal title: ACM Transactions on Database Systems
Year: 1983
ISSN: 0362-5915, 1557-4644
DOI: 10.1145/319983.319987